Automatic Text Correction for Devanagari OCR



Similar resources

Statistical Learning for OCR Text Correction

The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications used in text analyzing pipeline. Recent models of OCR post-processing significantly improve the quality of OCR-generated text, but are still prone to suggest correction candidates from limited observations while insufficiently accounting for the characteristics of OCR errors. In this paper, ...


An Efficient OCR Error Correction Method for Japanese Text Recognition

OCR error correction using Japanese morphological analysis contains two time-consuming procedures: extraction of candidate words from combinations of candidate characters, and finding the most plausible word sequence in combinations of the candidate words. In this paper an optimal word extraction technique, and the use of lexical entries that are tailored for Japanese verb inflection, are inves...


Automatic Reformatting of OCR Text from Biomedical Journal Articles

The goal of the Medical Article Record System (MARS), being developed by the National Library of Medicine, is to reduce the manual keyboard entry of bibliographic citation fields for the MEDLINE database by automatically identifying and converting information from bitmapped images of biomedical journal article pages to ASCII data. An important element of this automatic conversion requires refor...


A Statistical Approach to Automatic OCR Error Correction in Context

This paper describes an automatic, context-sensitive, word-error correction system based on statistical language modeling (SLM) as applied to optical character recognition (OCR) postprocessing. The system exploits information from multiple sources, including letter n-grams, character confusion probabilities, and word-bigram probabilities. Letter n-grams are used to index the words in the lexico...


Evaluating OCR and Non-OCR Text

In literature, many feature types and learning algorithms are proposed for document classification. However, an extensive and systematic evaluation of the various approaches has not been done yet. In order to investigate different text representations for document classification, we have developed a tool which transforms documents into feature-value representations suitable for standard learning ...



Journal

Journal title: Indian Journal of Science and Technology

Year: 2016

ISSN: 0974-5645, 0974-6846

DOI: 10.17485/ijst/2016/v9i45/106372